A Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
نویسندگان
چکیده
We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. The main advantage of succinct indexes as opposed to succinct (integrated data/index) encodings is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this paper, we present succinct indexes for various data types, namely strings, binary relations and multi-labeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data. Using our techniques, we design a succinct encoding that represents a string of length n over an alphabet of size σ using nHk(S)+lg σ ·o(n)+O( n lg σ lg lg lg σ ) bits to support access/rank/select operations in o((lg lg σ) ) time, for any fixed constant > 0. We also design a succinct text index using nH0(S) +O( n lg σ lg lg σ ) bits that supports finding all the occ occurrences of a given pattern of length m in O(m lg lg σ+ occ lgn/ lg σ) time, for any fixed constant > 0. Previous results on these two problems either have a lg σ factor instead of lg lg σ in the running time, or are not compressed. Finally, we present succinct encodings of binary relations and multi-labeled trees that are more compact than previous structures.
منابع مشابه
Succinct Indexes
This thesis defines and designs succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. As opposed to succinct (integrat...
متن کاملSuccinct Encodings for XPath Location Steps
We consider in this paper the problem of encoding XML documents in small space while still supporting XPath Location steps efficiently. We model XML documents as multi-labeled trees, and propose for those an encoding which takes space close to the lower bound suggested by information theory, while still supporting the search for the ancestors, descendants and children matching a given label eff...
متن کاملSuccinct Encoding of Permutations and its Applications to Text Indexing (2003; Munro, Raman, Raman, Rao)
A succinct data structure for a given data type is a representation of the underlying combinatorial object that uses an amount of space “close” to the information theoretic lower bound, together with algorithms that supports operations of the data type “quickly”. A natural example is the representation of a binary tree [5]: an arbitrary binary tree on n nodes can be represented in 2n + o(n) bit...
متن کاملSuccinct Online Dictionary Matching with Improved Worst-Case Guarantees
In the online dictionary matching problem the goal is to preprocess a set of patterns D = {P1, . . . , Pd} over alphabet Σ, so that given an online text (one character at a time) we report all of the occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick like data structure for the online dictionary matching pro...
متن کاملProbabilistic analysis of the asymmetric digital search trees
In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...
متن کامل